FakeLocator: Robust Localization of GAN-Based Face Manipulations
Full face synthesis and partial face manipulation by generative
adversarial networks (GANs) and their variants have raised wide public concern.
In the multimedia forensics area, detecting and ultimately locating image
forgery has become an imperative task. In this work, we investigate the
architecture of existing GAN-based face manipulation methods and observe that
the imperfection of the upsampling methods therein can serve as an
important asset for GAN-synthesized fake image detection and forgery
localization. Based on this observation, we propose a novel
approach, termed FakeLocator, to obtain high localization accuracy, at full
resolution, on manipulated facial images. To the best of our knowledge, this is
the very first attempt to solve the GAN-based fake localization problem with a
gray-scale fakeness map that preserves more information about fake regions. To
improve the universality of FakeLocator across multifarious facial attributes,
we introduce an attention mechanism to guide the training of the model. To
improve the universality of FakeLocator across different DeepFake methods, we
propose partial data augmentation and single sample clustering on the training
images. Experimental results on the popular FaceForensics++ and DFFD datasets
and on seven different state-of-the-art GAN-based face generation methods show
the effectiveness of our method. Compared with the baselines, our method
performs better on various metrics. Moreover, the proposed method is robust
against various real-world facial image degradations such as JPEG compression,
low resolution, noise, and blur.
Comment: 16 pages, accepted to IEEE Transactions on Information Forensics and Security
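One plausible way to construct such a gray-scale fakeness map as a training target is the normalized per-pixel difference between the original and manipulated images; this is our illustration of the idea, not necessarily the paper's exact construction:

```python
import numpy as np

def grayscale_fakeness_map(real_img, fake_img):
    """Per-pixel manipulation strength, normalized to [0, 1].
    Unlike a binary mask, the gray values preserve how strongly
    each region was altered."""
    diff = np.abs(fake_img.astype(np.float64) - real_img.astype(np.float64))
    if diff.ndim == 3:          # average over color channels
        diff = diff.mean(axis=2)
    peak = diff.max()
    return diff / peak if peak > 0 else diff

real = np.zeros((4, 4), dtype=np.uint8)
fake = real.copy()
fake[1:3, 1:3] = [[60, 120], [120, 240]]   # a manipulated patch
fm = grayscale_fakeness_map(real, fake)
print(fm.max(), fm[0, 0])   # 1.0 at the most-edited pixel, 0.0 elsewhere
```

The gray levels make the map more informative than a binary forged/pristine mask: lightly blended boundary pixels get intermediate values instead of a hard label.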
Among Us: Adversarially Robust Collaborative Perception by Consensus
Multiple robots can perceive a scene (e.g., detect objects) collaboratively
better than an individual robot, but when using deep learning they are
vulnerable to adversarial attacks. This could be addressed by adversarial
defense, but its training requires knowledge of the attacking mechanism, which
is often unavailable. Instead, we
propose ROBOSAC, a novel sampling-based defense strategy generalizable to
unseen attackers. Our key idea is that collaborative perception should lead to
consensus rather than dissensus in results compared to individual perception.
This leads to our hypothesize-and-verify framework: perception results with and
without collaboration from a random subset of teammates are compared until
reaching a consensus. In such a framework, more teammates in the sampled subset
often entail better perception performance but require longer sampling time to
reject potential attackers. Thus, we derive how many sampling trials are needed
to ensure the desired size of an attacker-free subset, or equivalently, the
maximum size of such a subset that we can successfully sample within a given
number of trials. We validate our method on the task of collaborative 3D object
detection in autonomous driving scenarios.
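The trial-count derivation sketched above follows RANSAC-style reasoning: the chance that a uniformly sampled subset of teammates contains no attacker fixes how many trials are needed for a desired success probability. A minimal sketch (function names and the 99% target are ours, not from the paper):

```python
import math

def attacker_free_prob(n_teammates, n_attackers, subset_size):
    """Probability that a uniformly sampled subset contains no attacker."""
    clean = n_teammates - n_attackers
    if subset_size > clean:
        return 0.0
    return math.comb(clean, subset_size) / math.comb(n_teammates, subset_size)

def required_trials(n_teammates, n_attackers, subset_size, success_prob=0.99):
    """Trials needed so that at least one sampled subset is attacker-free
    with probability >= success_prob (RANSAC-style bound)."""
    p = attacker_free_prob(n_teammates, n_attackers, subset_size)
    if p == 0.0:
        return math.inf
    if p == 1.0:
        return 1
    return math.ceil(math.log(1 - success_prob) / math.log(1 - p))

# e.g. 7 teammates, 2 of them attackers, sampling subsets of size 3
print(required_trials(7, 2, 3))  # 14 trials for 99% confidence
```

The same formula can be inverted: for a fixed trial budget, it yields the largest subset size that can still be sampled attacker-free with the desired confidence, which is the trade-off the abstract describes.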
TFormer: A Transmission-Friendly ViT Model for IoT Devices
Deploying high-performance vision transformer (ViT) models on ubiquitous
Internet of Things (IoT) devices to provide high-quality vision services will
revolutionize the way we live, work, and interact with the world. Because the
limited resources of IoT devices conflict with the demands of
resource-intensive ViT models, using cloud servers to assist ViT model
training has become mainstream. However, owing to the large number of
parameters and floating-point operations (FLOPs) of existing ViT models, the
parameters transmitted by cloud servers are voluminous, and the resulting
models are difficult to run on resource-constrained IoT devices. To this end,
this paper proposes a
transmission-friendly ViT model, TFormer, for deployment on
resource-constrained IoT devices with the assistance of a cloud server. The
high performance and small number of model parameters and FLOPs of TFormer are
attributed to the proposed hybrid layer and the proposed partially connected
feed-forward network (PCS-FFN). The hybrid layer consists of nonlearnable
modules and a pointwise convolution, which can obtain multitype and multiscale
features with only a few parameters and FLOPs to improve the TFormer
performance. The PCS-FFN adopts group convolution to reduce the number of
parameters. The key idea of this paper is to design TFormer with few model
parameters and FLOPs so that applications running on resource-constrained IoT
devices can benefit from the high performance of ViT models.
Experimental results on the ImageNet-1K, MS COCO, and ADE20K datasets for image
classification, object detection, and semantic segmentation tasks demonstrate
that the proposed model outperforms other state-of-the-art models.
Specifically, TFormer-S achieves 5% higher accuracy on ImageNet-1K than
ResNet18 with 1.4x fewer parameters and FLOPs.
Comment: IEEE Transactions on Parallel and Distributed Systems
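Group convolution, as used in the PCS-FFN, cuts parameters roughly by the number of groups, since each group only connects a slice of input channels to a slice of output channels. A back-of-the-envelope sketch (the channel sizes and group count are illustrative, not taken from the paper):

```python
def conv_params(c_in, c_out, k=1, groups=1):
    """Weight count of a k x k convolution with `groups` groups, no bias.
    Each group maps c_in/groups input channels to c_out/groups outputs."""
    assert c_in % groups == 0 and c_out % groups == 0
    return groups * (c_in // groups) * (c_out // groups) * k * k

# Dense pointwise layer vs. a grouped one of the same width:
dense = conv_params(256, 1024, k=1, groups=1)    # 262,144 weights
grouped = conv_params(256, 1024, k=1, groups=4)  # 65,536 weights
print(dense, grouped, dense // grouped)
```

With 4 groups the weight count drops by exactly 4x, which is why grouping the FFN projections shrinks both the transmitted parameters and the FLOPs.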
Seed Feature Maps-based CNN Models for LEO Satellite Remote Sensing Services
Deploying high-performance convolutional neural network (CNN) models on
low-earth orbit (LEO) satellites for rapid remote sensing image processing has
attracted significant interest from industry and academia. However, the limited
resources available on LEO satellites contrast with the demands of
resource-intensive CNN models, necessitating the adoption of ground-station
server assistance for training and updating these models. Existing approaches
often require large floating-point operations (FLOPs) and substantial model
parameter transmissions, presenting considerable challenges. To address these
issues, this paper introduces a ground-station server-assisted framework. With
the proposed framework, each layer of the CNN model contains only one learnable
feature map (called the seed feature map) from which other feature maps are
generated based on specific rules. The hyperparameters of these rules are
randomly generated instead of being trained, thus enabling the generation of
multiple feature maps from the seed feature map and significantly reducing
FLOPs. Furthermore, since the random hyperparameters can be saved using a few
random seeds, the ground station server assistance can be facilitated in
updating the CNN model deployed on the LEO satellite. Experimental results on
the ISPRS Vaihingen, ISPRS Potsdam, UAVid, and LoveDA datasets for semantic
segmentation services demonstrate that the proposed framework outperforms
existing state-of-the-art approaches. In particular, the SineFM-based model
achieves a higher mIoU than the UNetFormer on the UAVid dataset, with 3.3x
fewer parameters and 2.2x fewer FLOPs.
Comment: 11 pages
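The core mechanism above can be sketched in a few lines: one learnable seed feature map, plus cheap generated maps whose hyperparameters come from a reproducible random seed, so the ground station only needs to transmit the seed map and a few integers. The sine-based rule below is our guess at the flavor of "SineFM"; the transform and hyperparameter ranges are illustrative assumptions:

```python
import numpy as np

def generate_feature_maps(seed_fm, n_maps, rng_seed=0):
    """Generate n_maps feature maps from one learnable seed feature map by
    applying sine transforms with randomly drawn, untrained hyperparameters.
    Only rng_seed (not the hyperparameters) needs to be stored/transmitted."""
    rng = np.random.default_rng(rng_seed)
    freqs = rng.uniform(0.5, 2.0, size=n_maps)     # random frequency per map
    phases = rng.uniform(0.0, np.pi, size=n_maps)  # random phase per map
    return np.stack([np.sin(f * seed_fm + p) for f, p in zip(freqs, phases)])

seed_fm = np.zeros((8, 8))  # stand-in for the learnable seed feature map
maps = generate_feature_maps(seed_fm, n_maps=16, rng_seed=42)
print(maps.shape)  # (16, 8, 8)
```

Because the generator is deterministic given `rng_seed`, a model update only needs the retrained seed maps and the seeds, which is what makes ground-station-assisted updating of the on-satellite model cheap.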
LUNA: A Model-Based Universal Analysis Framework for Large Language Models
Over the past decade, Artificial Intelligence (AI) has achieved great success
and is being used in a wide range of academic and industrial fields. More
recently, large language models (LLMs) have made rapid advancements that have
propelled AI to a new level, enabling even more diverse applications and
industrial domains, particularly in areas like software engineering and
natural language processing. Nevertheless, a number of emerging
trustworthiness concerns and issues exhibited by LLMs have recently received
much attention; without properly addressing them, the widespread adoption of
LLMs could be greatly hindered in practice. The distinctive characteristics of
LLMs, such as the self-attention mechanism, extremely large model scale, and
autoregressive generation schema, differ from classic AI software based on
CNNs and RNNs and present new challenges for quality analysis. Universal and
systematic analysis techniques for LLMs are still lacking despite the urgent
industrial demand. To bridge this gap, we initiate an early
exploratory study and propose a universal analysis framework for LLMs, LUNA,
designed to be general and extensible, to enable versatile analysis of LLMs
from multiple quality perspectives in a human-interpretable manner. In
particular, we first leverage the data from desired trustworthiness
perspectives to construct an abstract model as an auxiliary analysis asset,
which is empowered by various abstract model construction methods. To assess
the quality of the abstract model, we collect and define a number of
evaluation metrics, targeting both the abstract-model level and the semantics
level. Then, the semantics, i.e., the degree to which the LLM satisfies the
trustworthiness perspective, is bound to the abstract model, enriching it and
enabling more detailed analysis applications for diverse purposes.
Comment: 44 pages, 9 figures
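The abstract states that LUNA supports multiple abstract model construction methods; one generic flavor of such a construction is to discretize continuous hidden states into abstract states and estimate transition probabilities by counting. The sketch below is our own minimal illustration of that idea (all names, the nearest-centroid abstraction, and the centroid choice are assumptions, not LUNA's actual implementation):

```python
import numpy as np

def build_abstract_model(traces, n_states, rng_seed=0):
    """Abstract real-valued hidden-state traces into a discrete transition
    model: map each state to its nearest centroid (centroids drawn from the
    data for simplicity), then estimate transition probabilities by counting."""
    rng = np.random.default_rng(rng_seed)
    all_states = np.concatenate(traces)
    centroids = all_states[rng.choice(len(all_states), n_states, replace=False)]

    def abstract(x):
        return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

    counts = np.zeros((n_states, n_states))
    for trace in traces:
        labels = [abstract(x) for x in trace]
        for a, b in zip(labels, labels[1:]):
            counts[a, b] += 1
    totals = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, totals, out=np.zeros_like(counts),
                      where=totals > 0)
    return centroids, probs

# toy traces: 5 runs of 20 four-dimensional hidden states each
traces = [np.random.default_rng(i).normal(size=(20, 4)) for i in range(5)]
centroids, probs = build_abstract_model(traces, n_states=8)
print(probs.shape)  # (8, 8)
```

Once such an abstract model exists, quality metrics at the "abstract model level" can be computed on its states and transitions, and semantics scores for a trustworthiness perspective can be attached to states, which matches the two metric levels the abstract describes.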